Connects the Variables
As described earlier, correlation assesses the relationship between two continuous numeric variables
(as opposed to categorical variables, as described in Chapter 8). This relationship can also be
evaluated with regression analysis to provide more information about how these two variables are
related. But perhaps more importantly, regression is not limited to continuous variables, nor is it
limited to only two variables. Regression is about developing a formula that explains how all the
variables in the regression are related. In the following sections, we explain the purpose of regression
analysis, identify some terms and notation typically used, and describe common types of regression.
Understanding the purpose of regression analysis
You may wonder how fitting a formula to a set of data can be useful. There are actually many uses.
With regression, you can
Test for a significant association or relationship between two or more variables. The process
is similar to correlation, but it is more general and produces an equation or formula relating
the variables.
Get a compact representation of your data. A well-fitting regression model succinctly
summarizes the relationships between the variables in your data.
Make precise predictions, or prognoses. With a properly fitted survival function (see Chapter
23), you can generate a customized survival curve for a newly diagnosed cancer patient based on
that patient’s age, gender, weight, disease stage, tumor grade, and other factors to predict how long
they will live. A bit morbid, perhaps, but you could certainly do it.
Do mathematical manipulations easily and accurately on a fitted function that may be difficult
or inaccurate to do graphically on the raw data. These include making estimates within the range
of the measured values (called interpolation) as well as outside the measured values (called
extrapolation, and considered risky). You may also want to smooth the data, which is described in
Chapter 19.
Obtain numerical values for the parameters that appear in the regression model
formula. Chapter 19 explains how to make a regression model based on a theoretical rather than a
known statistical distribution (described in Chapter 3). Such a model is used to develop estimates
like the ED50 of a drug, which is the dose that produces one-half the maximum effect.
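The difference between interpolation and extrapolation mentioned in the list above can be sketched in a few lines of Python. This is only an illustration: the straight-line formula, the parameter values, and the observed height range are all assumed for the example and do not come from any real data set.

```python
# Hypothetical fitted straight line: weight_kg = INTERCEPT + SLOPE * height_cm
INTERCEPT = -60.0   # assumed value, for illustration only
SLOPE = 0.75        # assumed value, for illustration only

# Range of heights actually present in the (hypothetical) data used for the fit
OBSERVED_MIN_CM = 150.0
OBSERVED_MAX_CM = 190.0


def predict_weight(height_cm):
    """Return the predicted weight and whether the estimate is an
    interpolation (inside the observed range) or an extrapolation."""
    predicted = INTERCEPT + SLOPE * height_cm
    inside = OBSERVED_MIN_CM <= height_cm <= OBSERVED_MAX_CM
    kind = "interpolation" if inside else "extrapolation"
    return predicted, kind


for height in (170.0, 210.0):
    weight, kind = predict_weight(height)
    print(f"{height:.0f} cm -> {weight:.1f} kg ({kind})")
```

The formula returns a number for any height you give it, which is exactly why extrapolation is risky: nothing in the arithmetic warns you that 210 cm lies outside the range the model was fitted on, so the check has to be made separately.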
Talking about terminology and mathematical notation
A regression model is a formula that describes how one variable, the dependent variable,
depends on one or more other variables, and on one or more parameters. (While it is technically
possible to have more than one dependent variable in a model, a discussion of this type of
regression is outside the scope of this book.) The dependent variable is also called the outcome,
and the other variables are called independent variables or predictors. Parameters are the
other terms that appear in the formula. The statistical software you are using determines their
values so that the function comes as close as possible to the observed data.
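To make these terms concrete, here is a minimal sketch that fits the simplest kind of regression model, a straight line, using ordinary least squares. The data values and variable names are made up for illustration, and the choice of a straight line is an assumption; in practice your statistical software does this calculation for you.

```python
# Hypothetical data, for illustration only
dose_mg = [1.0, 2.0, 3.0, 4.0, 5.0]      # independent variable (predictor)
response = [2.1, 3.9, 6.2, 7.8, 10.0]    # dependent variable (outcome)


def fit_straight_line(x, y):
    """Estimate the two parameters (intercept and slope) of
    y = intercept + slope * x by ordinary least squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sum of squares about the means
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_straight_line(dose_mg, response)
print(f"response = {intercept:.2f} + {slope:.2f} * dose")
```

Here response is the dependent variable, dose is the single predictor, and the intercept and slope are the parameters whose values bring the line as close as possible to the observed points.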